Abstract

One possible method for predicting landfalling hurricane numbers is to first predict the number 
of hurricanes in the basin and then convert that prediction to a prediction of landfalling hurricane 
numbers using an estimated proportion. Should this work better than just predicting landfalling 
hurricane numbers directly? We perform a basic statistical analysis of this question in the context 
of a simple abstract model, and convert some previous predictions of basin numbers into landfalling 
numbers. 



1 Introduction 

We are interested in developing and comparing methods for predicting the distribution of the number of
hurricanes that make landfall in the US in future years. One class of methods first predicts the number
of hurricanes in the Atlantic basin, and then converts that prediction to a prediction of landfalling
numbers using an estimate of the proportion of basin hurricanes that make landfall. Is this class of
indirect methods likely to work any better than simpler direct methods that predict the number of
landfalls from historical landfall data? On the one hand, the direct methods avoid having to estimate
how basin hurricanes relate to landfalling hurricanes. On the other hand, there are more hurricanes in
the basin than at landfall, so it may be possible to predict basin numbers more accurately than
landfalling numbers (in some sense), and this accuracy might then feed through into the landfall
prediction.

To understand the relationship between these two methods a little better, we investigate some basic
statistical properties of the direct and indirect methods for predicting future hurricane rates.

In section 2 we present some basic statistical ideas that we will use in our analysis. In section 3 we
set up the problem and derive expressions for the likely performance of the indirect method in a general
context. In section 4 we consider the performance of a set of simple prediction methods for basin hurricane
numbers. In section 5 we specialize our analysis to the case where the basin hurricane numbers are
Poisson distributed. In section 6 we perform some Monte-Carlo simulations to check our approximations.
In sections 7 and 8 we apply the indirect method to make predictions of the number of landfalling
hurricanes, based on the basin hurricane number predictions of Binter et al. (2006). Finally, in section 9
we discuss our results.



2 Background on conditioning 

In this section we present some standard statistical results that we will use later. 
2.1 Basic definitions

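Consider two random variables $X$ and $Y$ with joint density $f_{X,Y}$ and marginals $f_X$ and $f_Y$. The
density of $Y \mid (X = x)$ is defined as

$$ f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}, \quad \text{where } f_X(x) \neq 0. \tag{1} $$

The conditional expectation is defined as $E(Y \mid X) = \psi(X)$, where

$$ \psi(x) = E(Y \mid X = x) = \int_{\mathbb{R}} y \, f_{Y|X}(y \mid x) \, dy. \tag{2} $$

The conditional variance is defined as $\mathrm{var}(Y \mid X) = v(X)$, where

$$ v(x) = \mathrm{var}(Y \mid X = x) = \int_{\mathbb{R}} [y - E(Y \mid X = x)]^2 \, f_{Y|X}(y \mid x) \, dy. \tag{3} $$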

2.2 Disaggregation of the variance 

From the definitions given above one can derive the following useful expressions, the second of which
disaggregates the variance of $Y$ into conditional expectations and variances:

$$ \mathrm{var}(Y \mid X) = E(Y^2 \mid X) - [E(Y \mid X)]^2, \tag{4} $$

and

$$ \mathrm{var}(Y) = E[\mathrm{var}(Y \mid X)] + \mathrm{var}[E(Y \mid X)]. \tag{5} $$
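As a small sanity check of equation (5), the following Python sketch computes both sides exactly with rational arithmetic for a toy example. The distributions are assumptions chosen only for illustration: $X$ is uniform on {1, 2, 3} and $Y \mid X \sim \mathrm{binomial}(X, 1/4)$, anticipating the model of section 3.

```python
from fractions import Fraction
from math import comb

# Toy model, assumed for illustration only: X uniform on {1, 2, 3}, Y | X ~ binomial(X, 1/4).
p = Fraction(1, 4)
x_pmf = {1: Fraction(1, 3), 2: Fraction(1, 3), 3: Fraction(1, 3)}


def binom_pmf(k, n, q):
    """Exact binomial probability P(K = k) for K ~ binomial(n, q)."""
    return comb(n, k) * q**k * (1 - q) ** (n - k)


def cond_moments(x):
    """Return (E(Y | X = x), var(Y | X = x)), the latter via equation (4)."""
    m1 = sum(y * binom_pmf(y, x, p) for y in range(x + 1))
    m2 = sum(y * y * binom_pmf(y, x, p) for y in range(x + 1))
    return m1, m2 - m1**2


# Left-hand side of (5): var(Y) from the joint pmf f_X(x) f_{Y|X}(y | x).
joint = {(x, y): px * binom_pmf(y, x, p) for x, px in x_pmf.items() for y in range(x + 1)}
e_y = sum(y * pr for (_, y), pr in joint.items())
var_y = sum(y * y * pr for (_, y), pr in joint.items()) - e_y**2

# Right-hand side of (5): E[var(Y | X)] + var[E(Y | X)].
e_cond_var = sum(px * cond_moments(x)[1] for x, px in x_pmf.items())
e_cond_mean = sum(px * cond_moments(x)[0] for x, px in x_pmf.items())
var_cond_mean = sum(px * (cond_moments(x)[0] - e_cond_mean) ** 2 for x, px in x_pmf.items())

print(var_y, e_cond_var + var_cond_mean)  # 5/12 5/12
```

Exact fractions are used so that the two sides can be compared for equality rather than up to rounding.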

2.3 Disaggregation of the variance of a product 

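From equation (5) we can then derive a useful method for disaggregating the variance of a product.
First, it is always true that

$$ \begin{aligned} \mathrm{var}(XY) &= E[\mathrm{var}(XY \mid X)] + \mathrm{var}[E(XY \mid X)] \\ &= E[X^2 \, \mathrm{var}(Y \mid X)] + \mathrm{var}[X \, E(Y \mid X)]. \end{aligned} \tag{6} $$

Now, if $X$ and $Y$ are independent we have $E(Y \mid X) = E(Y)$ and $\mathrm{var}(Y \mid X) = \mathrm{var}(Y)$, so

$$ \begin{aligned} \mathrm{var}(XY) &= E(X^2)\,\mathrm{var}(Y) + E(Y)^2\,\mathrm{var}(X) \\ &= \mathrm{var}(X)\,\mathrm{var}(Y) + E(X)^2\,\mathrm{var}(Y) + E(Y)^2\,\mathrm{var}(X). \end{aligned} \tag{7} $$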
We will use these expressions below. 
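Equation (7) can be checked in the same way. The Python sketch below uses two small independent discrete distributions, chosen purely for illustration, builds the exact distribution of the product $XY$, and compares its variance with the second line of equation (7).

```python
from fractions import Fraction
from itertools import product

# Hypothetical independent discrete variables (illustration only).
x_pmf = {0: Fraction(1, 2), 2: Fraction(1, 2)}
y_pmf = {1: Fraction(1, 4), 3: Fraction(3, 4)}


def mean(pmf):
    return sum(v * pr for v, pr in pmf.items())


def var(pmf):
    mu = mean(pmf)
    return sum((v - mu) ** 2 * pr for v, pr in pmf.items())


# Distribution of XY under independence: P(X = x, Y = y) = P(X = x) P(Y = y).
xy_pmf = {}
for (x, px), (y, py) in product(x_pmf.items(), y_pmf.items()):
    xy_pmf[x * y] = xy_pmf.get(x * y, 0) + px * py

direct = var(xy_pmf)
# Second line of equation (7).
formula = var(x_pmf) * var(y_pmf) + mean(x_pmf) ** 2 * var(y_pmf) + mean(y_pmf) ** 2 * var(x_pmf)
print(direct, formula)  # 31/4 31/4
```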



3 Basics of the conditional binomial model 

We now set up our model. Our overall approach is to start with a very general mathematical framework
(e.g. we do not initially assume that hurricane numbers are Poisson distributed), derive what we can at
this level of generality, and make additional assumptions along the way as and when necessary.
First, we need random variables for the annual numbers of hurricanes in the basin and at landfall, and
for their historical totals. We define these as follows:

• Let $\{X_t : t = 1, \dots, n\}$ be the sequence of annual historical basin hurricane numbers and let $X = \sum_{t=1}^{n} X_t$.

• Let $\{Y_t : t = 1, \dots, n\}$ be the sequence of annual historical landfalling hurricane numbers and let $Y = \sum_{t=1}^{n} Y_t$.

Now we consider estimating the proportion of hurricanes that make landfall, and the properties of the
most obvious estimator of that proportion. We still do not assume that the number of hurricanes in the
basin is Poisson distributed, but we do assume that the probability of a hurricane making landfall is
constant in time and is the same for all hurricanes, and that hurricanes make landfall independently of
one another. We write this (unknown) probability as $p$. Then the number of hurricanes that make landfall
in a given year, given the number in the basin, follows a binomial distribution:

$$ Y_t \mid X_t \sim \mathrm{binomial}(X_t, p). \tag{8} $$



A useful analogy is that each basin hurricane is a coin toss, with a probability $p$ of giving a head. The
number of hurricanes making landfall $Y_t$ is the number of heads in $X_t$ tosses.
Extending this to the total number making landfall over $n$ years, we also get a binomial:

$$ Y \mid X_1, \dots, X_n \sim \mathrm{binomial}(X, p). \tag{9} $$
3.1 Estimating the landfall proportion 

The most obvious way to try and estimate p from the historical data is using the simple ratio of the total 
number of historical landfalls to the total number of basin hurricanes: 

$$ \hat{p} = Y / X. \tag{10} $$

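What are the properties of this estimator? Is it unbiased, and what is its variance?
Regarding bias, we first note that

$$ E(\hat{p} \mid X_1, \dots, X_n) = p \tag{11} $$

and that

$$
\begin{aligned}
E(\hat{p}) &= E[E(\hat{p} \mid X_1, \dots, X_n)] && (12) \\
&= E(p) && (13) \\
&= p, && (14)
\end{aligned}
$$

and we see that $\hat{p}$ is unbiased.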

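Regarding the variance, a standard result for the binomial distribution is that

$$ \mathrm{var}(\hat{p} \mid X_1, \dots, X_n) = p(1 - p)/X. \tag{15} $$

Using equation (5), and noting that the second term vanishes because equation (11) shows that
$E(\hat{p} \mid X_1, \dots, X_n) = p$ is a constant, we can then decompose $\mathrm{var}(\hat{p})$ as follows:

$$
\begin{aligned}
\mathrm{var}(\hat{p}) &= E[\mathrm{var}(\hat{p} \mid X_1, \dots, X_n)] + \mathrm{var}[E(\hat{p} \mid X_1, \dots, X_n)] && (16) \\
&= p(1 - p)\,E(1/X). && (17)
\end{aligned}
$$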

That is, the variance of the estimate of the proportion is given only in terms of the proportion itself and
$E(1/X)$. The proportion can be estimated using a plug-in estimator, but the $E(1/X)$ factor is slightly
harder to deal with, and can only be evaluated once we have settled on a distribution for $X$. We consider
this for the Poisson distribution in section 5 below.
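Equations (11) and (17) hold for any distribution of $X$, and can be verified exactly in a small case. In the Python sketch below, the distribution of the $n$-year basin total $X$ (2 or 5 with equal probability) and the value $p = 1/3$ are hypothetical; the code enumerates every outcome of $Y \mid X \sim \mathrm{binomial}(X, p)$ and computes the mean and variance of $\hat{p} = Y/X$ with exact fractions.

```python
from fractions import Fraction
from math import comb

# Hypothetical example: the n-year basin total X is 2 or 5 with equal probability,
# and Y | X ~ binomial(X, p) with p = 1/3, as in equation (9).
p = Fraction(1, 3)
x_pmf = {2: Fraction(1, 2), 5: Fraction(1, 2)}


def binom_pmf(k, n, q):
    """Exact binomial probability P(K = k) for K ~ binomial(n, q)."""
    return comb(n, k) * q**k * (1 - q) ** (n - k)


# Exact first and second moments of p_hat = Y / X, equation (10).
mean_p_hat = Fraction(0)
second_moment = Fraction(0)
for x, px in x_pmf.items():
    for y in range(x + 1):
        prob = px * binom_pmf(y, x, p)
        mean_p_hat += prob * Fraction(y, x)
        second_moment += prob * Fraction(y, x) ** 2

var_p_hat = second_moment - mean_p_hat**2
e_inv_x = sum(px / x for x, px in x_pmf.items())  # E(1/X)
print(mean_p_hat, var_p_hat, p * (1 - p) * e_inv_x)  # 1/3 7/90 7/90
```

The mean equals $p$ exactly, as in (14), and the variance matches $p(1-p)E(1/X)$, as in (17).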

3.2 Landfall predictions 

Now we consider making predictions of future landfalling hurricane numbers using the estimated proportion
$\hat{p}$ and a prediction $\hat{\mu}$ of the mean number of basin hurricanes $\mu = E(X_{n+1})$. The first
question is then how to estimate $\mu$. One fairly general class of methods for estimating $\mu$ is to use the
historical data for the basin number of hurricanes in some way. We can write this as $\hat{\mu} = g(X_1, \dots, X_n)$,
where $g$ could be a linear or non-linear function of the historical data.

The most obvious reasonable forecast for the number of hurricanes making landfall is then $\hat{p}\hat{\mu}$.
What are the properties of this particular method?

We can establish the properties of this predictor as follows. 

For the bias: 

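$$
\begin{aligned}
E(\hat{p}\hat{\mu}) &= E[E(\hat{p}\hat{\mu} \mid X_1, \dots, X_n)] && (18) \\
&= E[\hat{\mu}\, E(\hat{p} \mid X_1, \dots, X_n)] && (19) \\
&= p\,E(\hat{\mu}) && (20)
\end{aligned}
$$

where the second line uses the fact that $\hat{\mu}$ is a function of $X_1, \dots, X_n$. Note that if $\hat{\mu}$ is
unbiased for $E(X_{n+1})$ then equation (20) implies that $\hat{p}\hat{\mu}$ is unbiased for $E(Y_{n+1}) = p\,E(X_{n+1})$
(this is a stronger result than asymptotic unbiasedness).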
For the variance: 

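$$
\begin{aligned}
\mathrm{var}(\hat{p}\hat{\mu}) &= E[\mathrm{var}(\hat{p}\hat{\mu} \mid X_1, \dots, X_n)] + \mathrm{var}[E(\hat{p}\hat{\mu} \mid X_1, \dots, X_n)] && (21) \\
&= E[\hat{\mu}^2\, \mathrm{var}(\hat{p} \mid X_1, \dots, X_n)] + \mathrm{var}[\hat{\mu}\, E(\hat{p} \mid X_1, \dots, X_n)] && (22) \\
&= E[\hat{\mu}^2 p(1 - p)/X] + p^2\, \mathrm{var}(\hat{\mu}) && (23) \\
&= p(1 - p)\, E(\hat{\mu}^2 / X) + p^2\, \mathrm{var}(\hat{\mu}) && (24)
\end{aligned}
$$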
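Equations (20) and (24) are exact, so a Monte-Carlo sketch should reproduce them up to sampling noise. The Python sketch below assumes, purely for illustration, annual basin counts uniform on 4 to 10 (no Poisson assumption is needed here), $p = 0.25$, $n = 30$ years, and $\hat{\mu}$ equal to the average of the last 11 years. It compares the simulated mean and variance of $\hat{p}\hat{\mu}$ with $p\,E(\hat{\mu})$ and with the right-hand side of (24) evaluated from the same draws.

```python
import random
import statistics

# Illustrative assumptions, not values from the paper.
P, N_YEARS, M, N_SIMS = 0.25, 30, 11, 10000


def one_draw(rng):
    """Simulate one n-year history and return (p_hat * mu_hat, mu_hat, X)."""
    xs = [rng.randint(4, 10) for _ in range(N_YEARS)]           # basin counts X_t
    ys = [sum(rng.random() < P for _ in range(x)) for x in xs]  # Y_t | X_t ~ binomial(X_t, p)
    total_x = sum(xs)
    p_hat = sum(ys) / total_x   # equation (10)
    mu_hat = sum(xs[-M:]) / M   # one choice of g(X_1, ..., X_n)
    return p_hat * mu_hat, mu_hat, total_x


rng = random.Random(7)
draws = [one_draw(rng) for _ in range(N_SIMS)]
preds = [d[0] for d in draws]
mu_hats = [d[1] for d in draws]

# E(mu_hat) = 7 for counts uniform on 4..10, so equation (20) gives p * 7.
print(f"mean {statistics.fmean(preds):.3f} vs p * E(mu_hat) = {P * 7:.3f}")

simulated_var = statistics.pvariance(preds)
# Right-hand side of (24): p(1 - p) E(mu_hat^2 / X) + p^2 var(mu_hat).
rhs = P * (1 - P) * statistics.fmean([m * m / x for _, m, x in draws]) + P**2 * statistics.pvariance(mu_hats)
print(f"simulated var {simulated_var:.4f} vs equation (24) {rhs:.4f}")
```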



We consider various approximations to this expression in the next two sections, which will allow us to 
evaluate it in certain situations. 



4 Linear predictors of basin hurricane numbers 

We now move on to consider linear predictors of the number of hurricanes in the basin, i.e. methods that
use a weighted sum of historical values as an estimator of $\mu$. We write this as

$$ \hat{\mu} = \sum_{i=1}^{n} w_i X_i. \tag{25} $$



This linear framework includes the mixed baseline models of Jewson et al. (2005), and models that use
linear regression of hurricane numbers on sea surface temperature.

To account for climate variability, the weights may be chosen to generate an estimator that uses only
recent data. For example:

$$ w_i = \begin{cases} 0, & i = 1, \dots, n - m, \\ \dfrac{1}{m}, & i = n - m + 1, \dots, n. \end{cases} \tag{26} $$
Under this model it may, in some cases, be reasonable to suppose that $\hat{\mu}$ is generated so that
$\mathrm{cov}(\hat{\mu}^2, 1/X)$ is small relative to $E(\hat{\mu}^2)E(1/X)$. Roughly speaking, this occurs if the errors
we make when estimating the proportion are not highly correlated with the errors we make when making
the basin prediction. If we can assume that the covariance term is small, then we can simplify
equation (24) as follows:

$$
\begin{aligned}
\mathrm{var}(\hat{p}\hat{\mu}) &= p(1 - p)\, E(\hat{\mu}^2 / X) + p^2\, \mathrm{var}(\hat{\mu}) && (27) \\
&= p(1 - p)\,[E(\hat{\mu}^2)E(1/X) + \mathrm{cov}(\hat{\mu}^2, 1/X)] + p^2\, \mathrm{var}(\hat{\mu}) && (28) \\
&\approx p(1 - p)\, E(\hat{\mu}^2)\, E(1/X) + p^2\, \mathrm{var}(\hat{\mu}) && (29)
\end{aligned}
$$
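Whether the covariance term in (28) is small can be checked by simulation for a particular set-up. The Python sketch below again assumes, for illustration only, basin counts uniform on 4 to 10, $n = 30$ years and $\hat{\mu}$ equal to the average of the last 11 years, and estimates $\mathrm{cov}(\hat{\mu}^2, 1/X)$ relative to $E(\hat{\mu}^2)E(1/X)$. Here $\hat{\mu}$ and $X$ share 11 years of data, so the covariance should be negative, but small compared with the product term.

```python
import random
import statistics

# Illustrative assumptions, not values from the paper.
N_YEARS, M, N_SIMS = 30, 11, 20000
rng = random.Random(3)

mu_hat_sq, inv_x = [], []
for _ in range(N_SIMS):
    xs = [rng.randint(4, 10) for _ in range(N_YEARS)]
    mu_hat = sum(xs[-M:]) / M   # average of the last m years, equation (26)
    mu_hat_sq.append(mu_hat**2)
    inv_x.append(1 / sum(xs))   # 1/X, with X the n-year total

e_joint = statistics.fmean([a * b for a, b in zip(mu_hat_sq, inv_x)])  # E(mu_hat^2 / X)
product_term = statistics.fmean(mu_hat_sq) * statistics.fmean(inv_x)   # E(mu_hat^2) E(1/X)
relative_cov = (e_joint - product_term) / product_term
print(f"cov(mu_hat^2, 1/X) / [E(mu_hat^2) E(1/X)] = {relative_cov:.4f}")
```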



5 Poisson model for basin hurricanes 

We now specialize our analysis to the case where the number of hurricanes in the basin can be modelled
with a Poisson distribution, which allows us to approximate the $E(1/X)$ term, and hence evaluate
equations (17) and (29).

We start by assuming that the annual counts are independent and Poisson distributed, with the same
Poisson mean in each year:

$$ X_t \sim \mathrm{Poisson}(\mu) \quad \text{for all } t. \tag{30} $$

Then the total number of hurricanes over $n$ years is also Poisson distributed:

$$ X \sim \mathrm{Poisson}(n\mu) \tag{31} $$

(statisticians usually prove this by inspection of moment generating functions).

At this point we briefly mention a small mathematical problem, which is that we are now going to
consider $1/X$, even though $X$, being Poisson distributed, can take the value 0. To get around this
problem rigorously one can condition on $X > 0$, which would introduce a small adjustment factor
to the expressions derived below. We will, however, ignore this. Effectively we are assuming that the
probability of $X$ being zero is small, and this should be borne in mind when applying the results we
derive. This should be a reasonable assumption if $X$ is the number of Atlantic basin hurricanes, but
would not be reasonable if $X$ were the number of category 5 Atlantic basin hurricanes, for instance.
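Our approximation for $E(1/X)$ is based on a Taylor expansion for the annual numbers:

$$ E(1/X_t) = \frac{1}{\mu}\left[1 + \frac{1}{\mu} + \frac{2}{\mu^2} + O\!\left(\frac{1}{\mu^3}\right)\right] \tag{32} $$

$$ \Longrightarrow \quad E(1/X) = \frac{1}{n\mu}\left[1 + \frac{1}{n\mu} + \frac{2}{(n\mu)^2} + O\!\left(\frac{1}{(n\mu)^3}\right)\right] \tag{33} $$

Thus, to first order, $E(1/X) \approx 1/(n\mu)$.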
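How quickly the expansion (33) converges can be checked numerically. The Python sketch below sums the Poisson series for $E(1/X \mid X > 0)$ directly (so it includes the small conditioning adjustment that the text ignores) and compares it with the leading one, two and three terms of (33). The value $n\mu = 120$ (for example $\mu = 6$ and $n = 20$) is a hypothetical choice.

```python
import math


def e_inv_x(lam, tol=1e-16):
    """E(1/X | X > 0) for X ~ Poisson(lam), by direct summation of the series.

    Uses math.exp(-lam), so it is only suitable for lam up to a few hundred.
    """
    k, pmf, total = 1, lam * math.exp(-lam), 0.0  # pmf = P(X = k), starting at k = 1
    while k <= lam or pmf > tol:
        total += pmf / k
        k += 1
        pmf *= lam / k
    return total / (1.0 - math.exp(-lam))  # condition on X > 0


def taylor_e_inv_x(lam, n_terms):
    """Leading n_terms (1 to 3) of equation (33), with lam = n * mu."""
    terms = [1.0, 1.0 / lam, 2.0 / lam**2]
    return sum(terms[:n_terms]) / lam


lam = 120.0  # hypothetical n * mu
exact = e_inv_x(lam)
rel_errors = [abs(taylor_e_inv_x(lam, j) - exact) / exact for j in (1, 2, 3)]
print(exact, rel_errors)
```

With these values the first-order approximation is already within about 1% of the series, and each extra term reduces the error further.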

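If we take this first order approximation and substitute it into equation (17), then we get:

$$ \mathrm{var}(\hat{p}) \approx \frac{p(1 - p)}{n\mu}. \tag{34} $$

And if we substitute it into equation (29) we get

$$ \mathrm{var}(\hat{p}\hat{\mu}) \approx \frac{p(1 - p)\, E(\hat{\mu}^2)}{n\mu} + p^2\, \mathrm{var}(\hat{\mu}). \tag{35} $$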

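One simple prediction method for the mean number of hurricanes in the basin is to take a straight average
of $m$ years of data. Given this,

$$ \mathrm{var}(\hat{\mu}) = \mu/m \tag{36} $$

and

$$
\begin{aligned}
E(\hat{\mu}^2) &= \mu/m + \mu^2 && (37) \\
&= \mu(1 + m\mu)/m, && (38)
\end{aligned}
$$

so that equation (35) becomes

$$ \mathrm{var}(\hat{p}\hat{\mu}) \approx \frac{p(1 - p)(1 + m\mu)}{nm} + \frac{p^2 \mu}{m}. \tag{39} $$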

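How accurate are these results based on the first-order approximations? They will be reasonable if $n\mu$,
the expected number of basin hurricanes used to estimate the proportion, is large. Better approximations
to $\mathrm{var}(\hat{p})$ and $\mathrm{var}(\hat{p}\hat{\mu})$ can easily be generated by using higher order terms in the
approximation of $E(1/X)$.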



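For example, keeping the higher-order terms shown in equation (33) and substituting into equation (17) gives

$$ \mathrm{var}(\hat{p}) \approx \frac{p(1 - p)}{n\mu}\left[1 + \frac{1}{n\mu} + \frac{2}{(n\mu)^2}\right]. $$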
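Equation (39) makes it easy to compare the direct and indirect methods. Under the Poisson model with binomial landfall, the landfall counts $Y_t$ are themselves Poisson with mean $p\mu$, so an $m$-year average of landfall counts has variance $p\mu/m$. The Python sketch below evaluates the ratio of (39) to this direct variance as a function of $n$; the basin rate $\mu = 6$ is a hypothetical value, $p = 0.254$ is the cat 1-5 estimate quoted in section 7.2, and $m = 11$ matches the set-up of section 6.

```python
def var_direct(mu, p, m):
    """Variance of an m-year average of landfall counts Y_t ~ Poisson(p * mu)."""
    return p * mu / m


def var_indirect(mu, p, m, n):
    """First-order approximation (39): p_hat from n years times an m-year basin average."""
    return p * (1 - p) * (1 + m * mu) / (n * m) + p**2 * mu / m


mu, p, m = 6.0, 0.254, 11  # mu is hypothetical; p is the cat 1-5 estimate of section 7.2
ratios = {n: var_indirect(mu, p, m, n) / var_direct(mu, p, m) for n in range(11, 61)}
for n in (11, 20, 35, 55):
    print(n, round(ratios[n], 3))

# Smallest n for which the indirect method has less than half the direct variance.
n_half = min(n for n, r in ratios.items() if r < 0.5)
print("n_half =", n_half)
```

The ratio simplifies to $(1-p)(1+m\mu)/(n\mu) + p$: it is slightly above one when $n = m$ and falls towards $p$ as $n$ grows.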



6 Simulation tests 

We now test the first order approximation using Monte-Carlo simulations. We consider the following 
situation: 

• We estimate the mean number of hurricanes making landfall using just the last 11 years of
landfall data. This is the direct prediction.

• We estimate the mean number of basin hurricanes using the same 11 years of data.

• We convert the basin estimate to an estimate for landfalling numbers using an estimated proportion,
which is based on between 11 and over 50 years of data. Eleven of the years of data used to estimate
the proportion are the same years that are used to estimate the rates. This is the indirect prediction.

• We estimate the variances of all these predictions.

Using Monte-Carlo simulations we can compare the variance estimate given by equation (39) with the actual
variances. The results are given in figure 1. The black line gives the variance of the landfall
prediction based on 11 years of historical landfall data, from equation (36) applied to the landfall counts.
The black circles give estimates of this variance based on the simulations. The grey line gives our
theoretical approximation to the variance of the indirect method, based on equation (39). The grey circles
give estimates of the variance of the indirect method based on the simulations. We see that:

• The theoretical estimate of the variance for the indirect method is in very good agreement with
the results from the simulations, even though we have only used a first order approximation to derive
equation (39).

• The variance of the indirect method is lower than the variance of the direct method when the 
proportion is estimated using more years of data than are being used for the rate estimates. Using 
35 or more years of data makes the indirect method more than twice as accurate, in terms of 
variance. 
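A minimal version of the simulation described above can be written in a few lines of Python. This is a sketch under stated assumptions rather than the code used to produce figure 1: the basin rate $\mu = 6$, the number of simulations and the random seeds are hypothetical, Poisson variates are drawn with Knuth's multiplication method because the Python standard library has no Poisson sampler, and the proportion is estimated from $n$ years whose last 11 years are also used for the rates.

```python
import math
import random
import statistics


def poisson(lam, rng):
    """Knuth's multiplication method; adequate for small means such as annual counts."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate(mu, p, m, n, n_sims, seed):
    """Monte-Carlo variances of the direct and indirect landfall predictions."""
    rng = random.Random(seed)
    direct, indirect = [], []
    for _ in range(n_sims):
        xs = [poisson(mu, rng) for _ in range(n)]                   # basin counts
        ys = [sum(rng.random() < p for _ in range(x)) for x in xs]  # landfalls, equation (8)
        direct.append(sum(ys[-m:]) / m)             # m-year average of landfalls
        p_hat = sum(ys) / sum(xs)                   # proportion from all n years
        indirect.append(p_hat * sum(xs[-m:]) / m)   # p_hat times m-year basin average
    return statistics.pvariance(direct), statistics.pvariance(indirect)


mu, p, m = 6.0, 0.254, 11  # mu is hypothetical; p is the cat 1-5 estimate of section 7.2
results = {}
for n in (11, 20, 35, 55):
    v_direct, v_indirect = simulate(mu, p, m, n, n_sims=3000, seed=n)
    approx = p * (1 - p) * (1 + m * mu) / (n * m) + p**2 * mu / m  # equation (39)
    results[n] = (v_direct, v_indirect, approx)
    print(n, round(v_direct, 4), round(p * mu / m, 4), round(v_indirect, 4), round(approx, 4))
```

When $n = 11$ the proportion and the rates use exactly the same years, and the indirect prediction $\hat{p}\hat{\mu}$ then reduces to the 11-year landfall average, so the two methods coincide.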



7 Applying the indirect method 



We now make some predictions of future numbers of landfalling hurricanes by converting the
basin hurricane number predictions given in Binter et al. (2006) to landfalling predictions.



7.1 Step 1: predicting numbers of basin hurricanes 



The predictions of numbers of basin hurricanes that we use are taken from Binter et al. (2006), in which
mixed baseline models are used to predict future numbers of hurricanes in the basin. These models are
based on an analysis of change-points in the historical time-series of hurricane numbers. The intervals
between change-points are taken as periods of constant hurricane activity, and future activity
is then predicted on the assumption that the current level of activity will continue. The prediction is
given by an optimal combination of the observed activity rates in the historical data, where 'optimal' is
defined as minimising mean-square-error. This trades off the need to use as much of the historical data as
possible (for increased accuracy) against the desire to use only recent data (because it is likely to be the
most relevant for the future).

The predictions from Binter et al. (2006) are based on the change-point analyses of Elsner et al. (2000)
and Jewson and Penzer (2006). We include predictions based on both of these change-point analyses
to get an idea of the sensitivity of the results to the details of the methods used to detect the
change-points.

7.2 Step 2: relating basin hurricane numbers to landfalling hurricane numbers

The empirical relationships we use to convert the number of basin hurricanes to a number of landfalling 
hurricanes are simple estimates of the probability that hurricanes will make landfall, based on historical 
hurricane data for 1950 to 2005. For cat 1-5 hurricanes, we estimate this probability to be 0.254 (with a 
standard error of 0.058). For cat 3-5 hurricanes, we estimate this probability to be 0.240 (with a standard 
error of 0.057). 

7.3 Step 3: converting basin predictions to landfalling predictions 

The predictions used in step 1 above produce estimates for the mean number of basin hurricanes, the
variance of the number of basin hurricanes, and the standard error on the mean. The empirical relationships
in step 2 tell us how to convert a given number of hurricanes in the basin into a number at landfall.
How, then, should we combine this information to tell us about the distribution of the number of
hurricanes at landfall? A complete solution for the distribution of the number of hurricanes at landfall
would be slightly complicated to derive. The mixed baseline models themselves don't give a probabilistic
prediction, but just the first two moments and the standard error on the mean. Although they are built
on the assumption that the number of hurricanes is Poisson distributed, the predictions they produce
cannot strictly be interpreted as Poisson distributions because the mean and variance are not equal.
However, deriving expressions for the mean, variance, and standard error on the mean for the number of
landfalling hurricanes, which is all we are interested in, is rather simple. These expressions are given
in Jewson (2007).



Putting all of this together, we make predictions for: 

• The number of landfalling cat 1-5 hurricanes, based on the basin number of cat 1-5 hurricanes 

• The number of landfalling cat 3-5 hurricanes, based on the basin number of cat 3-5 hurricanes 

In each case we predict the mean number of hurricanes, the variance of the number of hurricanes, and 
the standard error on the mean (which is based on both the standard error on the prediction of the basin 
number of hurricanes, and the standard error of the estimate of the proportion making landfall). 
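To see why the mean and variance are simple to obtain, treat $p$ as known and apply the conditional binomial model (8) to the forecast year, writing $X_{n+1}$ and $Y_{n+1}$ for the basin and landfalling counts. Equation (5) then gives the following. This is a simplified sketch that ignores the uncertainty in $\hat{p}$ and in the basin prediction (both of which enter the standard errors described above), and it is not necessarily the exact form of the expressions in Jewson (2007).

```latex
\begin{aligned}
E(Y_{n+1}) &= E[E(Y_{n+1} \mid X_{n+1})] = p\,E(X_{n+1}), \\
\mathrm{var}(Y_{n+1}) &= E[\mathrm{var}(Y_{n+1} \mid X_{n+1})] + \mathrm{var}[E(Y_{n+1} \mid X_{n+1})] \\
&= p(1 - p)\,E(X_{n+1}) + p^2\,\mathrm{var}(X_{n+1}).
\end{aligned}
```

If the basin prediction were exactly Poisson, with $\mathrm{var}(X_{n+1}) = E(X_{n+1})$, this would reduce to $\mathrm{var}(Y_{n+1}) = p\,E(X_{n+1}) = E(Y_{n+1})$, as expected for a thinned Poisson count; when the basin variance differs from the basin mean, so does the landfall variance.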



8 Predictions from the indirect method 



8.1 Predictions based on Elsner change points



The results from our analysis based on the change-points from Elsner et al. (2000) are shown in table 1.
The first four rows of this table are for cat 1-5 storms, while the second four rows are for cat 3-5
storms. As an example, consider the first row. From Binter et al. (2006), table 5, we can see that the
short baseline model predicts 8.45 hurricanes in the basin, with a standard error of 0.877. Converting
that to a prediction of landfalling hurricanes using the estimated probability of landfall of 0.254 gives
2.15 hurricanes, which is the value for the mean shown in the first row of table 1. Similarly, the variance
of the number of landfalls in this case is 2.74. The standard error, which arises because of (a) the
uncertainty in the prediction of the number of storms in the basin and (b) the uncertainty in how to
convert that number to a prediction at landfall, is 0.549.
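The conversion of the mean in this example is a single multiplication. The Python sketch below also shows one simple way to propagate the two standard errors, under the assumption (made here for illustration) that the basin prediction and $\hat{p}$ are independent, by applying equation (7) to their product. The function name is hypothetical, and because the sketch omits any further terms it need not reproduce the standard errors in tables 1 and 2, which use the expressions of Jewson (2007).

```python
import math


def landfall_from_basin(mu_b, se_mu_b, p, se_p):
    """Hypothetical helper: landfall mean p * mu_b and a standard error for it.

    Treats the basin prediction and p_hat as independent random variables and
    applies equation (7) to their product; a simplified sketch only.
    """
    mean = p * mu_b
    var = se_mu_b**2 * se_p**2 + p**2 * se_mu_b**2 + mu_b**2 * se_p**2
    return mean, math.sqrt(var)


# Values quoted in the text for the short baseline model (cat 1-5).
mean, se = landfall_from_basin(mu_b=8.45, se_mu_b=0.877, p=0.254, se_p=0.058)
print(round(mean, 2), round(se, 3))
```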

How do these new predictions for the mean number of landfalling hurricanes compare with the previous
results in Binter et al. (2006)? Considering the most complex model in each case (model 4 in table 1), the
prediction for cat 1-5 storms changes from 2.09 to 2.08, which is negligible. For cat 3-5 storms (model 8
in table 1), however, the prediction changes from 0.82 to 0.92. This is a larger change (although still
well within the standard error estimates). What is driving this increase? It turns out that the percentage
increase in the basin number of severe storms over the last 11 years is rather larger than the percentage
increase in the number of severe storms at landfall (basin severe storm numbers have increased by 88%
relative to the long-term baseline, while landfalling severe storm numbers have increased by only 40%).
This increase in basin severe storms leads to a high prediction of the future number of severe storms in
the basin, and this in turn leads to a high prediction of the number of severe storms at landfall when
using this method of predicting landfall numbers from the predicted basin numbers. As discussed in the
introduction, there are good reasons to think this might be a more accurate prediction than the lower
prediction based on the landfall data alone, since the landfall data are so sparse.

8.2 Predictions based on RMS change points 

The results from our analysis based on the change-points from O'Shay and Jewson (2007) are shown in
table 2. Once again, the first four rows of this table are for cat 1-5 storms, while the second four rows
are for cat 3-5 storms.

We see a small decrease in the prediction of the number of cat 1-5 storms, and another, although smaller, 
increase in the number of cat 3-5 storms. 



9 Conclusions 

One possible way to predict landfalling hurricane numbers is to first predict basin hurricane numbers and 
then convert the basin numbers to landfall using an estimate of the proportion of the basin hurricanes 
that make landfall. This method can be compared with the simpler method of just predicting landfall 
numbers directly. We have performed some statistical analysis of these methods, to try and understand 
which is likely to be more accurate. In particular we have considered a situation where the direct method 
consists of estimating the landfall rates using an 11 year average of historical landfalling rates, and the 
indirect method consists of estimating basin rates using an 11 year average and then converting that to 
landfall rates using a proportion based on more than 11 years of data. Assuming that the probability of
individual hurricanes making landfall is constant in time, we have shown that the indirect method is
more accurate, and that the more data is used to estimate the proportion, the more accurate it becomes
relative to the direct method. Furthermore, we have derived expressions for the variance of the indirect
method, and using simulations we have shown that a simple analytic expression for this variance works
well.

We then apply the indirect method to convert some previous predictions of basin hurricane numbers into 
predictions of numbers of landfalls. The results for landfalling cat 1-5 storms are not that different between 
this method and results from predicting landfalling storm numbers directly from historical landfalls. The 
results for cat 3-5 storms, however, show higher predictions. This is because the number of cat 3-5 storms 
in the basin has increased more in recent years (proportionately) than the number of cat 3-5 storms at 
landfall. 

Preliminary results (as yet unpublished) suggest that the hypothesis that the probability of storms making 
landfall doesn't change in time cannot be rejected. This lends weight to the idea that these higher 
predictions of future numbers of intense landfalling storms may be more reliable. However, the difference 
between the two predictions is well within the standard error estimates, and so either prediction could 
easily have been much higher or lower just due to random effects. 
Figure 1: Variances from analytic expressions and Monte Carlo simulations. The black line shows the
variance of the direct landfall prediction, based on 11 years of data. The grey line shows an estimate of
the variance of the indirect landfall prediction, based on 11 years of basin data for the rate estimate and
$N_2$ years of data for the proportion estimate, using equation (39). The black circles show simulation-based
estimates of the variance of the direct prediction, and the grey circles show simulation-based estimates of
the variance of the indirect prediction. The simulations validate the approximations used to derive
equation (39).
Table 1: Predictions for landfalling hurricane numbers, based on the change points from Elsner et al.
(2000) and the method described in the text.
Table 2: Predictions for landfalling hurricane numbers, based on the change points from Binter et al.
(2006) and the method described in the text.



